---
title: 'SPEC-14: Cloud Git Versioning & GitHub Backup'
type: spec
permalink: specs/spec-14-cloud-git-versioning
tags:
- git
- github
- backup
- versioning
- cloud
related:
- specs/spec-9-multi-project-bisync
- specs/spec-9-follow-ups-conflict-sync-and-observability
status: deferred
---

# SPEC-14: Cloud Git Versioning & GitHub Backup

**Status: DEFERRED** - Postponed until multi-user/teams feature development. Using S3 versioning (SPEC-9.1) for v1 instead.

## Why Deferred

**Original goals can be met with simpler solutions:**
- Version history → **S3 bucket versioning** (automatic, zero config)
- Offsite backup → **Tigris global replication** (built-in)
- Restore capability → **S3 version restore** (`bm cloud restore --version-id`)
- Collaboration → **Deferred to teams/multi-user feature** (not v1 requirement)

**Complexity vs value trade-off:**
- Git integration adds: committer service, puller service, webhooks, LFS, merge conflicts
- Risk: Loop detection between Git ↔ rclone bisync ↔ local edits
- S3 versioning gives 80% of value with 5% of complexity

**When to revisit:**
- Teams/multi-user features (PR-based collaboration workflow)
- User requests for commit messages and branch-based workflows
- Need for fine-grained audit trail beyond S3 object metadata

---

## Original Specification (for reference)

## Why

Early access users want **transparent version history**, easy **offsite backup**, and a familiar **restore/branching** workflow. Git/GitHub integration would provide:
- Auditable history of every change (who/when/why)
- Branches/PRs for review and collaboration
- Offsite private backup under the user's control
- Escape hatch: users can always `git clone` their knowledge base

**Note:** These goals are now addressed via S3 versioning (SPEC-9.1) for single-user use case.

## Goals

- **Transparent**: Users keep using Basic Memory; Git runs behind the scenes.
- **Private**: Push to a **private GitHub repo** that the user owns (or tenant org).
- **Reliable**: No data loss, deterministic mapping of filesystem ↔ Git.
- **Composable**: Plays nicely with SPEC‑9 bisync and upcoming conflict features (SPEC‑9 Follow‑Ups).

**Non‑Goals (for v1):**
- Fine‑grained per‑file encryption in Git history (can be layered later).
- Large media optimization beyond Git LFS defaults.

## User Stories

1. *As a user*, I connect my GitHub and choose a private backup repo.
2. *As a user*, every change I make in cloud (or via bisync) is **committed** and **pushed** automatically.
3. *As a user*, I can **restore** a file/folder/project to a prior version.
4. *As a power user*, I can **git pull/push** directly to collaborate outside the app.
5. *As an admin*, I can enforce repo ownership (tenant org) and least‑privilege scopes.

## Scope

- **In scope:** Full repo backup of `/app/data/` (all projects) with optional selective subpaths.
- **Out of scope (v1):** Partial shallow mirrors; encrypted Git; cross‑provider SCM (GitLab/Bitbucket).

## Architecture

### Topology

- **Authoritative working tree**: `/app/data/` (bucket mount) remains the source of truth (SPEC‑9).
- **Bare repo** lives alongside: `/app/git/${tenant}/knowledge.git` (server‑side).
- **Mirror remote**: `github.com/<owner>/<repo>.git` (private).

```mermaid
flowchart LR
  A[/Users & Agents/] -->|writes/edits| B[/app/data/]
  B -->|file events| C[Committer Service]
  C -->|git commit| D[(Bare Repo)]
  D -->|push| E[(GitHub Private Repo)]
  E -->|webhook (push)| F[Puller Service]
  F -->|git pull/merge| D
  D -->|checkout/merge| B
```

### Services

- **Committer Service** (daemon):
  - Watches `/app/data/` for changes (inotify/poll)
  - Batches changes (debounce e.g. 2–5s)
  - Writes `.bmmeta` (if present) into commit message trailer (see Follow‑Ups)
  - `git add -A && git commit -m "chore(sync): <summary> BM-Meta: <json>"`
  - Periodic `git push` to GitHub mirror (configurable interval)
- **Puller Service** (webhook target):
  - Receives GitHub webhook (push) → `git fetch`
  - **Fast‑forward** merges to `main` only; reject non‑FF unless policy allows
  - Applies changes back to `/app/data/` via clean checkout
  - Emits sync events for Basic Memory indexers

### Auth & Security

- **GitHub App** (recommended): minimal scopes: `contents:read/write`, `metadata:read`, webhook.
- Tenant‑scoped installation; repo created in user account or tenant org.
- Tokens stored in KMS/secret manager; rotated automatically.
- Optional policy: allow only **FF merges** on `main`; non‑FF requires PR.

### Repo Layout

- **Monorepo** (default): one repo per tenant mirrors `/app/data/` with subfolders per project.
- Optional multi‑repo mode (later): one repo per project.

### File Handling

- Honor `.gitignore` generated from `.bmignore.rclone` + BM defaults (cache, temp, state).
- **Git LFS** for large binaries (images, media) — auto track by extension/size threshold.
- Normalize newline + Unicode (aligns with Follow‑Ups).

### Conflict Model

- **Primary concurrency**: SPEC‑9 Follow‑Ups (`.bmmeta`, conflict copies) stays the first line of defense.
- **Git merges** are a **secondary** mechanism:
  - Server only auto‑merges **text** conflicts when trivial (FF or clean 3‑way).
  - Otherwise, create `name (conflict from <branch>, <ts>).md` and surface via events.

### Data Flow vs Bisync

- Bisync (rclone) continues between local sync dir ↔ bucket.
- Git sits **cloud‑side** between bucket and GitHub.
- On **pull** from GitHub → files written to `/app/data/` → picked up by indexers & eventually by bisync back to users.

## CLI & UX

New commands (cloud mode):
- `bm cloud git connect` — Launch GitHub App installation; create private repo; store installation id.
- `bm cloud git status` — Show connected repo, last push time, last webhook delivery, pending commits.
- `bm cloud git push` — Manual push (rarely needed).
- `bm cloud git pull` — Manual pull/FF (admin only by default).
- `bm cloud snapshot -m "message"` — Create a tagged point‑in‑time snapshot (git tag).
- `bm restore <path> --to <commit|tag>` — Restore file/folder/project to prior version.

Settings:
- `bm config set git.autoPushInterval=5s`
- `bm config set git.lfs.sizeThreshold=10MB`
- `bm config set git.allowNonFF=false`

## Migration & Backfill

- On connect, if repo empty: initial commit of entire `/app/data/`.
- If repo has content: require **one‑time import** path (clone to staging, reconcile, choose direction).

## Edge Cases

- Massive deletes: gated by SPEC‑9 `max_delete` **and** Git pre‑push hook checks.
- Case changes and rename detection: rely on git rename heuristics + Follow‑Ups move hints.
- Secrets: default ignore common secret patterns; allow custom deny list.

## Telemetry & Observability

- Emit `git_commit`, `git_push`, `git_pull`, `git_conflict` events with correlation IDs.
- `bm sync --report` extended with Git stats (commit count, delta bytes, push latency).

## Phased Plan

### Phase 0 — Prototype (1 sprint)

- Server: bare repo init + simple committer (batch every 10s) + manual GitHub token.
- CLI: `bm cloud git connect --token <PAT>` (dev‑only)
- Success: edits in `/app/data/` appear in GitHub within 30s.

### Phase 1 — GitHub App & Webhooks (1–2 sprints)

- Switch to GitHub App installs; create private repo; store installation id.
- Committer hardened (debounce 2–5s, backoff, retries).
- Puller service with webhook → FF merge → checkout to `/app/data/`.
- LFS auto‑track + `.gitignore` generation.
- CLI surfaces status + logs.

### Phase 2 — Restore & Snapshots (1 sprint)

- `bm restore` for file/folder/project with dry‑run.
- `bm cloud snapshot` tags + list/inspect.
- Policy: PR‑only non‑FF, admin override.
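
One way to think about the `bm restore` dry‑run is as a diff between the current tree and the tree at the target commit, limited to the requested path. The sketch below is illustrative only: it assumes both trees are already available as path → content‑hash mappings, and `plan_restore`, the `Tree` alias, and the action names are hypothetical, not part of Basic Memory or Git.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: file path -> content hash at a given commit.
Tree = Dict[str, str]


def plan_restore(current: Tree, target: Tree, path: str) -> List[Tuple[str, str]]:
    """Return (action, file) pairs that would make `path` match the target tree.

    Nothing is written; the result only describes what a restore would do.
    An empty `path` means the whole tree.
    """
    prefix = path.rstrip("/")

    def in_scope(p: str) -> bool:
        # Matches the exact file, or any file below the folder.
        return prefix == "" or p == prefix or p.startswith(prefix + "/")

    plan: List[Tuple[str, str]] = []
    for p in sorted(set(current) | set(target)):
        if not in_scope(p):
            continue
        if p not in target:
            plan.append(("delete", p))   # exists now, absent in target version
        elif p not in current:
            plan.append(("add", p))      # absent now, present in target version
        elif current[p] != target[p]:
            plan.append(("modify", p))   # content differs between versions
    return plan
```

Keeping the plan separate from the write step means the same output can back both the dry‑run report and the audit trail for an actual restore.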
### Phase 3 — Selective & Multi‑Repo (nice‑to‑have)

- Include/exclude projects; optional per‑project repos.
- Advanced policies (branch protections, required reviews).

## Acceptance Criteria

- Changes to `/app/data/` are committed and pushed automatically within configurable interval (default ≤5s).
- GitHub webhook pull results in updated files in `/app/data/` (FF‑only by default).
- LFS configured and functioning; large files don't bloat history.
- `bm cloud git status` shows connected repo and last push/pull times.
- `bm restore` restores a file/folder to a prior commit with a clear audit trail.
- End‑to‑end works alongside SPEC‑9 bisync without loops or data loss.

## Risks & Mitigations

- **Loop risk (Git ↔ Bisync)**: Writes to `/app/data/` → bisync → local → user edits → back again.
  *Mitigation*: Debounce, commit squashing, idempotent `.bmmeta` versioning, and watch exclusion windows during pull.
- **Repo bloat**: Lots of binary churn.
  *Mitigation*: default LFS, size threshold, optional media‑only repo later.
- **Security**: Token leakage.
  *Mitigation*: GitHub App with short‑lived tokens, KMS storage, scoped permissions.
- **Merge complexity**: Non‑trivial conflicts.
  *Mitigation*: prefer FF; otherwise conflict copies + events; require PR for non‑FF.

## Open Questions

- Do we default to **monorepo** per tenant, or offer project‑per‑repo at connect time?
- Should `restore` write to a branch and open a PR, or directly modify `main`?
- How do we expose Git history in UI (timeline view) without users dropping to CLI?

## Appendix: Sample Config

```json
{
  "git": {
    "enabled": true,
    "repo": "https://github.com/<owner>/<repo>.git",
    "autoPushInterval": "5s",
    "allowNonFF": false,
    "lfs": {
      "sizeThreshold": 10485760
    }
  }
}
```
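
The sample config stores `sizeThreshold` as a byte count (10485760), while the CLI setting uses `10MB`. The two agree if `MB` is read as 1024 × 1024 bytes. A loader could normalize both forms; the sketch below assumes that 1024‑based reading and a small set of duration suffixes. `parse_duration`, `parse_size`, and `load_git_config` are hypothetical helper names, not an existing Basic Memory API.

```python
import json
import re

# Assumed unit tables; the spec only shows "5s" and "10MB" as examples.
_DURATION_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0}
_SIZE_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_duration(value: str) -> float:
    """Convert a string such as '5s' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s|m)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1)) * _DURATION_UNITS[match.group(2)]


def parse_size(value) -> int:
    """Accept a byte count (int) or a string such as '10MB'; return bytes."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    match = re.fullmatch(r"(\d+)\s*(B|KB|MB|GB)", str(value).strip().upper())
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]


def load_git_config(text: str) -> dict:
    """Read the `git` section of the JSON config into normalized values."""
    git = json.loads(text)["git"]
    return {
        "enabled": bool(git.get("enabled", False)),
        "repo": git["repo"],
        "push_interval_s": parse_duration(git["autoPushInterval"]),
        "allow_non_ff": bool(git.get("allowNonFF", False)),
        "lfs_threshold_bytes": parse_size(git["lfs"]["sizeThreshold"]),
    }
```

Normalizing at load time keeps the committer and LFS logic working in seconds and bytes regardless of whether a value came from the JSON file or from `bm config set`.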
